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o 
o 

(N 



in 



(N 

> 
in 

00 

o 

cn 

o 
o 



Andrey V. Kravtsov\ Andreas A. Berlind^ ", Risa H. Wechsler^ "'', Anatoly A. Klypin'', 
Stefan Gottlober^ Brandon Allgood^', Joel R. Primack" 

Draft version February 2, 2008 

ABSTRACT 

We analyze the halo occupation distribution (HOD) and two-point correlation function of galaxy-size 
dark matter halos using high-resolution dissipationless simulations of the concordance flat ACDM model. 
The halo samples include both the host halos and the subhalos, distinct gravitationally-bound halos 
within the virialized regions of larger host systems. We find that the HOD, the probability distribution 
for a halo of mass M to host a number of subhalos N, is similar to that found in semi-analytic and 
A-body+gasdynamics studies. Its first moment, (A)j\/, has a complicated shape consisting of a step, 
a shoulder, and a power-law high-mass tail. The HOD can be described by Poisson statistics at high 
halo masses but becomes sub-Poisson for {N)m ^ 4. We show that the HOD can be understood as 
a combination of the probability for a halo of mass M to host a central galaxy and the probability to 
host a given number Ns of satellite galaxies. The former can be approximated by a step-like function, 
while the latter can be well approximated by a Poisson distribution, fully specifled by its first moment. 
The first moment of the satellite HOD can be well described by a simple power law (Ns) oc with 
/3 « 1 for a wide range of number densities, redshifts, and different power spectrum normalizations. 
This formulation provides a simple but accurate model for the halo occupation distribution found in 
simulations. At z = 0, the two-point correlation function (CF) of galactic halos can be well fit by a 
power law down to ^ lOOft.^"'^ kpc with an amplitude and slope similar to those of observed galaxies. 
The dependence of correlation amplitude on the number density of objects is in general agreement with 
results from the SDSS survey. At redshifts > 1, we find significant departures from the power-law 
shape of the CF at small scales, where the CF steepens due to a more pronounced one-halo component. 
The departures from the power law may thus be easier to detect in high-redshift galaxy surveys than at 
the present-day epoch. They can be used to put useful constraints on the environments and formation of 
galaxies. If the deviations are as strong as indicated by our results, the assumption of the single power 
law often used in observational analyses of high-redshift clustering is dangerous and is likely to bias the 
estimates of the correlation length and slope of the correlation function. 

Subject headings: cosmology: theory-galaxies: formation- galaxies: halos-large-scale structure of 
universe 



1. INTRODUCTION 

Understanding the processes that drive galaxy cluster- 
ing has always been one of the main goals of observa- 
tional cosmology. In particular, the physical explanation 
for the approximately power-law shape of the galaxy two- 
point correlation function (e.g., Peebles 1980, and refer- 
ences therein) is still an open problem. High-resolution 
cosmological simulations over the past decade have shown 
that on small scales (< 1 — 2 Mpc) the correlation func- 
tion of matter strongly deviates from the power-law shape. 
The direct implication of this result is that the spatial 
distribution of galaxies on small scales is biased with re- 
spect to the overall distribution of matter in a non-trivial 
scale-dependent way (e.g., Klypin et al. 1996; Jenkins et al. 
1998). In view of this, it is very interesting to understand 
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whether the power-law shape of the correlation function is 
a fortuitous coincidence or a consequence of some funda- 
mental physical process. 

The physics of galaxy formation, which almost certainly 
plays a role in determining how galaxies of different types 
and luminosities are clustered, is complicated and still 
rather poorly understood. Galaxy mergers, gas cooling, 
and star formation are just a few of the many processes 
that can potentially affect the clustering statistics of a 
galaxy sample. Nevertheless, despite the apparent com- 
plexity, there is evidence that gravitational dynamics alone 
may explain the basic features of galaxy clustering, at least 
in the simple case of galaxies selected above a luminosity 
or mass threshold. Building on several pioneering studies 
(e.g., Carlberg 1991; Brainerd & Villumsen 1992, 1994a,b; 
Colin et al. 1997), Kravtsov & Klypin (1999) and Colin 
et al. (1999) used high-resolution A-body simulations that 
resolved both isolated halos and dark matter substructure 
within virialized halos to show that the correlation func- 
tion of galactic halos has a power-law shape with an am- 
plitude and slope similar to those measured in the APM 
galaxy catalog (Baugh 1996). More recently, Neyrinck 
et al. (2003) showed that dark matter subhalos identified 
in a different set of dissipationless simulations have a cor- 
relation function and power spectrum that matches that 
of the galaxies in the PSCz survey. 

These results suggest that the spatial distribution of 
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galaxies can be explained to a large extent simply by asso- 
ciating galaxies brighter than a certain luminosity thresh- 
old with dark matter halos more massive than a certain 
mass corresponding to that threshold. In practice, how- 
over, wc can expect a considerable band-dependent scatter 
between galaxy luminosity and halo mass. The scatter in 
general needs to be accounted for in the model. 

Although the power spectrum and correlation functions 
provide a relatively simple and useful measure of galaxy 
clustering, the implications for the physics of galaxy for- 
mation are often difficult to extract using these statistics 
alone. The halo occupation distribution (HOD) formal- 
ism, developed during the last several years, is a power- 
ful theoretical framework for predicting and interpreting 
galaxy clustering. The formalism describes the bias of 
a class of galaxies using the probability P{N\M) that a 
halo of virial mass M contains N such galaxies and addi- 
tional proscriptions that specify the relative distribution 
of galaxies and dark matter within halos. If, as theoret- 
ical models seem to predict (Bond ct al. 1991; Lemson 
& Kauffmann 1999; Berlind et al. 2003), the HOD at a 
fixed halo mass is statistically independent of the halo's 
large-scale environment, this description of galaxy bias is 
essentially complete. Given the HOD, as well as the halo 
population predicted by a particular cosmological model, 
one can calculate any galaxy clustering statistic at both 
linear and highly non-linear scales. In addition, the HOD 
can be more easily related to the physics of galaxy forma- 
tion than most other statistics. 

Several aspects of the HOD model have been studied 
using semi-analytic galaxy formation models (Kauffmann 
et al. 1997; Governato et al. 1998; Jing et al. 1998; Kauff- 
mann et al. 1999a, b; Benson ct al. 2000b, a; Shcth & Di- 
aferio 2001; Somerville et al. 2001; Wechsler et al. 2001; 
Berlind et al. 2003) and cosmological gasdynamics simu- 
lations (White et al. 2001; Yoshikawa et al. 2001; Pearce 
et al. 2001; Berhnd et al. 2003). Berlind et al (2003) 
present a detailed comparison of the HOD in a semi-analytic 
model and gasdynamics simulations. They find that, de- 
spite radically different treatments of the cooling, star 
formation, and stellar feedback in the two approaches to 
galaxy formation modeling, for galaxy samples of the same 
space density the predicted HODs are in almost perfect 
agreement. This result lends indirect support to the idea 
that the HOD, and hence galaxy clustering, is driven pri- 
marily by gravitational dynamics rather than by processes 
such as cooling and star formation. It is therefore in- 
teresting to study the HOD that is predicted in purely 
dissipationless cosmological simulations. The probability 
distribution, P{N\M), in this case is measuring the proba- 
bility for an isolated halo of mass M to contain subhalos 
within its virial radius. As the observational constraints 
on the HOD and its evolution improve, the predictions of 
the halo HOD can be compared to the HOD of galaxies in 
order to determine to what extent gravity alone is respon- 
sible for galaxy clustering. 

In this paper we use high resolution dissipationless simu- 
lations of the concordance ACDM model to study the HOD 
of dark matter halos and its evolution. The paper is orga- 
nized as follows: in § 2 and § 3 we describe the simulations 
and the halo identification algorithm that we use. In § 4 
we describe the halo samples used in our analyses. In § 5 
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ACDMso 


1.0 
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1.07 X 10" 


1.9 


ACDMso 


0.75 
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512^ 


3.16 X 10* 
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wc review the main features of the HOD formalism and the 
associated halo model of dark matter clustering. In § 6.1 
we present HOD predictions for dark matter substructure 
and in § 6.2 we show the corresponding predictions for the 
two-point correlation function. In § 7 and § 8 we discuss 
and summarize our results. 

2. SIMULATIONS 

We analyze the halo occupation distribution and cluster- 
ing in the concordance fiat ACDM model: fio = 1 — f^A = 
0.3, h = 0.7, where VLq and ^^a are the present-day matter 
and vacuum densities, and h is the dimensionless Hub- 
ble constant defined as _ffo = 100ft. km s~^Mpc~^. This 
model is consistent with recent observational constraints 
(e.g., Spergel et al. 2003). To study the effects of the 
power spectrum normalization and resolution we consider 
two simulations of the ACDM cosmology. The first simu- 
lation followed the evolution of 256^ « 1.67 x 10^ particles 
m a was normalized 

to CTs = 1.0, where erg is the rms fiuctuation in spheres of 
comoving radius. This simulation was used pre- 
viously to study the halo clustering and bias by Kravtsov 
& Klypin (1999) and Colin et al. (1999) and wc refer 
the reader to these papers for further numerical details. 
This simulation was also used to study halo concentrations 
(Bullock ct al. 2001b), the specific angular momentum dis- 
tribution (Bullock et al. 2001a), and the accretion history 
of halos (Wechsler et al. 2002). The second simulation fol- 
lowed the evolution of 512^ « 1.34 x 10^ particles in the 
same cosmology, but in a 8O/1-1 Mpc » 114.29 Mpc box 
and with a power spectrum normalization of erg = 0.75. 
This normalization is suggested by several recent measure- 
ments (e.g., Borgani et al. 2001; Pierpaoli et al. 2001; La- 
hav et al. 2002; Schuecker et al. 2003; Jarvis et al. 2003). 

The simulations were run using the Adaptive Refine- 
ment Tree A^-body code (ART; Kravtsov ct al. 1997; Kravtsov 
1999). The ART code reaches high force resolution by re- 
fining all high-density regions with an automated refine- 
ment algorithm. The refinements are recursive: the refined 
regions can also be refined, each subsequent refinement 
having half of the previous level's cell size. This creates 
an hierarchy of refinement meshes of different resolution 
covering regions of interest. The criterion for refinement 
is the mass of particles per cell. In the ACDMgo the code 
refined an individual cell only if the mass exceeded nth — 5 
particles independent of the refinement level. In terms of 
overdensity, this means that all regions with overdensity 
higher than 8 = nth 2^^/n, where n is the average number 
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density of particles in the cube, were refined to the refine- 
ment level L. Thus, for the ACDMeo simulation , n is 1/8. 
The peak formal dynamic range reached by the code in this 
simulation is 32, 768, which corresponds to the peak formal 
resolution (the smallest grid cell) of ft-pcak — 1.83h~^ kpc; 
the actual force resolution is « 2/ipeak = 3.7/i~^ kpc (see 
Kravtsov et al. 1997). In the higher-resolution ACDMgo 
simulations the refinement criterion was level- and time- 
dependent. At the early stages of evolution (a < 0.65) the 
thresholds wore sot to 2, 3, and 4 particle masses for the 
zeroth, first, and second and higher levels, respectively. 
At low redshifts, a > 0.65, the thresholds for these refine- 
ment levels were set to 6, 5, and 5 particle masses. The 
lower thresholds at high redshifts were set to ensure that 
collapse of small-mass halos is followed with higher res- 
olution. The maximum achieved level of refinement was 
i'max = 8, which corresponds to the comoving cell size 
of 1.22h~^ kpc. As a function of redshift the maximum 
level of refinement was equal to Lmax = 6 for 5 < z < 7, 
.^^max = 7 for 1 < z < 5, I/max > 8 for z < 1. The peak 
formal resolution was /ipeak < 1.2/i~^ kpc (physical). The 
parameters of the simulations are summarized in Table 1. 

3. HALO IDENTIFICATION 

Identification of DM halos in the very high-density envi- 
ronments of groups and clusters is a challenging problem. 
The goal of this study is to investigate the halo occupation 
distribution and clustering of the overall halo population. 
Therefore, we need to identify both host halos with cen- 
ters that do not lie within any larger virialized system and 
subhalos located within the virial radii of larger systems. 
Below we use the terms satellites, subhalos, and substruc- 
ture interchangeably. 

To identify halos and the subhalos within them we use 
a variant of the Bound Density Maxima (BDM) halo find- 
ing algorithm Klypin et al. (1999, hereafter KGKK). We 
start by calculating the local overdensity at each particle 
position using the SPH smoothing kernel of 24 particles. 
The number of kernel particles roughly corresponds to the 
lowest halo mass that we hope to identify. We then sort 
particles according to their overdensity and use all parti- 
cles with S > Synin = 2000 as potential halo centers. The 
specific value of iJmin was chosen after experimentation to 
ensure completeness of the halo catalogs on the one hand 
while maximizing the efficiency of the halo finder. 

Starting with the highest overdensity particle, we sur- 
round each potential center by a sphere of radius rfind = 
50h~^ kpc and exclude all particles within this sphere from 
further center search. The search radius is defined by the 
size of smallest systems we aim to identify. We checked 
that results do not change if this radius is decreased by a 
factor of two. After all potential centers are identified, we 
analyze the density distribution and velocities of surround- 
ing particles to test whether the center corresponds to a 
gravitationally bound clump. Specifically, we construct 
density, circular velocity, and velocity dispersion profiles 
around each center and itoratively remove unbound parti- 
cles using the procedure outlined in Klypin et al. (1999). 
We then construct final profiles using only bound parti- 
cles and use them to calculate properties of halos such as 

^ To calculate the density we use the publicly available code smooth: 
http : / /www-hpcc . astro . Washington . edu/tools/tools . html 



maximum circular velocity Vmax, mass M, etc. 

The virial radius is meaningless for the subhalos within 
a larger host as their outer layers are tidally stripped. 
The definition of the outer boundary of a subhalo and its 
mass are thus somewhat ambiguous. We adopt the trunca- 
tion radius, rt, at which the logarithmic slope of the den- 
sity profile constructed from the bound particles becomes 
larger than —0.5, as we do not expect the density profile of 
the CDM halos to be flatter than this slope. Empirically, 
this definition roughly corresponds to the radius at which 
the density of the gravitationally bound particles is equal 
to the background host halo density, albeit with a large 
scatter. For some halos rt is larger than their virial radius. 
In this case, we set rt — i?vir- For each halo we also con- 
struct the circular velocity profile Vdrc (r) = \/GM{< r)/r 
and compute the maximum circular velocity Vmax- 

Figure 1 shows the particle distribution in the most mas- 
sive halos identified at z = and z = 3 along with the ha- 
los (circles) identified by the halo finder. The particles are 
color-coded on a grey scale according to the logarithm of 
their density to enhance visibility of substructure clumps. 
The radius of circles is proportional to the halo's Knax- 
The figure shows that the algorithm is efficient in identi- 
fying substructure down to small masses. Note that the 
smallest halos plotted in Fig. 1 have masses smaller than 
our completeness limit of w 50 particles. This approximate 
limit corresponds to the mass below which cumulative 
mass and velocity functions start to flatten signiflcantly. 
In the following analysis, wc will consider only halos with 
masses M > 50rMp (corresponding to 1.6 x 10^ Mq 
and 5.4 X 10^° h''^ Mq in the ACDMgo and ACDMgo boxes 
respectively). 

To classify the halos, we calculate the formal boundary 
of each object as the radius corresponding to the enclosed 
overdensity of 180 with respect to the mean density around 
its center. The halos whoso center is located within the 
boundary of a larger mass halo we call subhalos or satel- 
lites. The halos that are not classified as satellites are 
identified as host halos. Note that the center of a host 
halo is not considered to be a subhalo. Thus, host halos 
may or may not contain any subhalos with circular velocity 
above the threshold of a given sample. The host centers, 
however, are included in clustering statistics because we 
assume that each host harbors a central galaxy at its cen- 
ter. Therefore, the total sample of galactic halos contains 
central and satellite galaxies. The former have positions 
and maximum circular velocities of their host halos, while 
the latter have positions and K„ax of subhalos. 

In the observed universe, the analogy is simple. The 
Milky Way, for example, would be the central galaxy in a 
host halo of mass Mh ~ lO^^/i"^ M© because its center is 
not within any larger virialized system.^ The host halo of 
the Galaxy contains a number of satellites, which would 
or would not be included in a galaxy sample depending on 
how deep the sample is. In a rich cluster, the brightest 
cluster galaxy that typically resides near the cluster cen- 
ter would be associated with the cluster host halo in our 
terminology. All other galaxies within the virial radius 
of the cluster would be considered "satellites" associated 
with subhalos. 

* Note that the Local Group is not virialized and the Milky Way 
and Andromeda reside in two independent host halos. 



Fig. 1. — Distribution of dark matter particles (points) and dark matter halos (circles) identified by our halo finding algorithm centered on 
the most massive halo in the ACDMgo simulation at 2 = 3 (left) and z = (right). The radius of the largest circle indicates the actual virial 
radius, -R18O, of the most massive halo (-R180 = 0.67h~^ comoving Mpc at 2: = 3 and R180 = 2.1h~^ Mpc at 2 = 0); the radii of all other 
halos are scaled using the halo' maximum circular velocities (rh = 0.65Vinax kpc with Knax in km s~^). 



4. HALO SAMPLES 

To construct a halo catalog, we have to define selection 
criteria based on particular halo properties. Halo mass is 
usually used to define halo catalogs (e.g., a catalog can be 
constructed by selecting all halos in a given mass range). 
However, the mass and radius are very poorly defined for 
the satellite halos due to tidal stripping which alters a 
halo's mass and physical extent (see KGKK). Therefore, 
wc will use maximum circular velocity Vinax as a proxy 
for the halo mass. This allows us to avoid complications 
related to the mass and radius determination for satellite 
halos. Moreover, when a halo gets stripped 14nax changes 
less dramatically than the mass, and is therefore a more 
robust "label" of the halo. For isolated halos, Knax and 
the halo's virial mass are directly related. For the suhalos 
14iax will experience secular decrease but at a relatively 
slow rate. 

Instead of selecting objects in a given range of Knax, at 
each epoch we will select objects of a fixed set of number 
densities corresponding to (redshift-dependent) thresholds 
in maximum circular velocity: ni(> V^max)- Note that the 
number density here includes all the centers of the isolated 
host halos and the subhalos within the hosts (see eq. 21 in 
§ 7.3). 

The threshold selection is somewhat arbitrary, except 
that the limited box size puts a lower limit on the num- 
ber densities wc can consider. The completcniess limit of 
our catalogs imposes an upper limit on the number densi- 
ties we can consider. The number densities probed in this 
simulation span a representative range of galaxy number 
densities. We chose to focus on a set of number densities 
corresponding to a representative set of luminosity cuts 
for SDSS galaxies. Namely, we use the Schechter fit to the 



SDSS r-band luminosity function presented by Blanton 
et al. (2003) and select the set of galaxy number densi- 
ties corresponding to the absolute magnitude thresholds 
Mr = -16, -18, -19, -20, and -21 (the magnitudes 
quoted are M — 51og/i). The number densities and corre- 
sponding numbers of galactic halos in the analyzed simu- 
lations are listed in Table 2. Note that the median redshift 
of galaxies in the sample of Blanton et al. (2003) is z = 0.1, 
while we use simulation outputs at 2; = 0. However, using 
halos at the z = 0.1 output instead of z=0.0 results in only 
2% change in the values of threshold T^nax- 

Figure 2 shows the maximum circular velocity of the 
halos in our simulations with the same number density as 
the SDSS galaxies with a given Mr. For comparison the 
dotted lines show the power law luminosity-circular ve- 
locity relation, Lr oc V^ax' with a = 7 and a = 3. Note 
that the relation does not have a power law form at any 
circular velocity. For 100 < I^nax < 200 km s~^, the slope 
is a « 3, while for Knax < 100 km s"""^ the slope is much 
steeper: a « 7. Such steepening is of course required to 
match the shallow faint-end of the galaxy luminosity func- 
tion with the relatively steep circular velocity function. At 
^ax ^ 300km s~^, the relation becomes shallow because 
the number density of halos at these masses is dominated 
by the central "galaxies" which are assigned maximum cir- 
cular velocity of their group- or cluster-size host halo. The 
overall shape of the Mr — Vmax relation thus likely reflects 
the non-monotonic dependence of the mass-to-light ratio 
M\^/L on the host mass (e.g., Benson et al. 2000b; van 
den Bosch et al. 2003). The detailed comparisons with 
the observed Tully-Fisher, however, require more detailed 
modeling which takes into account effects of baryon cooling 
on Knax- At Knax > 300 km s~^, the maximum circular 
velocity is measured for the host group and cluster-size 
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Fig. 3. Bottom panels: cumulative mass functions of the halo samples (left panel: ACDMgOi right panel: ACDMgo) used in our analysis 
at different redsliifts. Note that the number density here includes all the centers of the isolated host halos and the subhalos located within 
the hosts (sec cq. 21 in § 7.3). The horizontal dotted lines indicate the number density thresholds adopted in our analysis. The curves were 
plotted down to the minimum halo mass of 50 particles. Top panels: the fraction of halos with masses larger than M classifed as subhalos: 
/sub = (n - "host)/"- 
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Fig. 2. — The maximum circular velocity of halos in the 
60/1-1 ]y[pp (dashed line) and 80h~^ Mpc [solid line) ACDM simu- 
lations vs. the r-band absolute magnitude of the SDSS galaxies of 
the same number density. The curves are obtained by matching the 
cumulative velocity functions ra(> Knax) (at z = 0) to the SDSS 
luminosity function n(< Mr). The dashed lines show the power- 
law luminosity-circular velocity relation, Lr oc V^ax! for a = 7 and 
a = 3. 



systems, rather than for the central object, as it is impos- 
sible to unambiguously separate the central object from 
the host group in dissipationless simulations. 

Figure 3 shows the cumulative mass functions for the 
ACDMeo and ACDMgo simulations at the analyzed epochs. 
The functions include both centers of the host halos and 
subhalos and are only plotted down to the mass corre- 
sponding to 50 particles, below which our halo catalogs 
are incomplete. The horizontal dotted lines show the num- 
ber density thresholds adopted for the subsequent analy- 
sis. Note that at high redshifts the halo catalogs are in- 
complete at the highest number densities. The figure also 
shows the fraction of n(> M) that is in objects classified 
as subhalos. Note that in our samples of galaxy-size ha- 
los, about 15 — 25% of all the halos arc in subhalos at 
any epoch. This is comparable to the observed fraction of 
~ 20% of galaxies located in groups and clusters. 

5. THE HALO MODEL 

The description of different elements of the halo model 
can be found in several recent papers (e.g., Seljak 2000; 
Ma & Fry 2000; Peacock & Smith 2000; Scoccimarro et al. 
2001; Sheth et al. 2001a,b; Berhnd & Weinberg 2002; Cooray 
& Sheth 2002; Yang et al. 2003). The model has quickly 
proven to be a very convenient analytic formalism for pre- 
dicting and interpreting the nonlinear clustering of dark 
matter and galaxies. The main idea behind the model 
is that all dark matter is bound up in halos that have 
well- understood properties. The dark matter distribution 
is then fully specified by 1) the halo mass function, 2) 
the linear bias of halos as a function of halo mass, and 



6 



Table 2 
Number density thresholds 



n 

Mpc"^ 


SDSS Mr 


-■'halo 


Ar60 
-■'halo 


5.86 X 10"^ 


-16 


30003 


12657 


2.79 X 10^2 


-18 


14285 


6026 


1.52 X 10"^ 


-19 


7782 


3282 


5.89 X 10"=^ 


-20 


3016 


1272 


1.11 X 10"^ 


-21 


568 


240 



3) the radial density profiles of halos as a function of halo 

mass. These three elements have been relatively well stud- 
ied using N-body simulations and they can be computed 
analytically, given a cosmological model. In order to spec- 
ify the galaxy distribution, two additional ingredients are 
required: 4) the probability distribution P{N\M) that a 
halo of virial mass M contains TV galaxies and 5) the rela- 
tive distribution of galaxies and dark matter within halos. 
These last two elements are called the halo occupation dis- 
tribution (HOD) and they are only now becoming the fo- 
cus of much attention both theoretically and observation- 
ally (Berlind ct al. 2003 and references therein). P{N\M) 
is the most important piece of the HOD in terms of its 
effect on galaxy clustering and it is the main focus of this 
study. We refer to this element when we use the term 
HOD henceforth. 

In the halo model the two-point c;orrclation function of 
the galaxy distribution is a sum of two terms: the "1-halo" 
term due to galaxy pairs within a single halo and the "2- 
halo" term due to pairs in separate distinct halos: 

At scales larger than the virial diameter of the largest ha- 
los, all pairs consist of galaxies in separate halos ^ 
^^^), while at smaller scales most pairs consist of galax- 
ies within the same halo {^^^ ^^^)- The two terms are 
given by 

1 
2 



■J n{M){N{N-l))MX{r\M)dM; (2) 



I 



n{M2)bh{M2){N)M,X{r\MuM2)dM2 (3) 



where fig is the mean number density of galaxies in the 
sample, n(M) is the halo mass function, bh{M) is the 
large-scale linear bias of halos, X{r\M) is the convolution 
of the radial profile of galaxies within halos with itself, 
A(r|Mi, M2) is the convolution of two different radial pro- 
files, and ^^"Jjj (r) is the linear dark matter correlation func- 
tion (also see Sheth et al. 2001b, a; Berlind & Weinberg 
2002). The integration is over the mass limit correspond- 
ing to the galaxy sample under consideration. 

On large scales, the 2-halo term reduces to Cgg(r) = 
6^^Jjj'Jjj(r), where b is the large-scale bias factor of galaxies. 



Equations 2 and 3 represent the halo model in its most 
basic form, but variations do exist (see, e.g., Zehavi et al. 
2003; Magliocchetti & Porciani 2003). The halo model is 
most easily applied in Fourier space because the calcula- 
tion of the correlation function in real space involves con- 
volutions, which turn into multiplications in Fourier space. 
For our purposes, however, it suffices to express galaxy 
clustering in terms of the real-space correlation function. 

Equations 2 and 3 show that depends on the average 
number of galaxies per halo of a given mass, 

(A^)M=$]iVP(JV|M), (4) 

N 

while depends on the second moment of P{N\M) 

{N{N -1))m = Y,N{N -l)P{N\M). (5) 

JV 

Higher order correlations depend on the higher order mo- 
ments of P{N\M). Both the mean {N)m and the shape 
of P{N\M) are thus key components of the halo model. 

For galaxy samples defined by a minimum luminosity 
threshold, the mean halo occupation {N)m is usually as- 
sumed to be a power law at high halo masses {M > 10^^ 
M0). This is supported both by theoretical models 
(Berlind et al. 2003 and references therein) and by halo 
model fits to observational data (e.g., Scoccimarro et al. 
2001; Zehavi et al. 2003; Magliocchetti & Porciani 2003). 
At low halo masses, {N)m is expected to reach a plateau 
{N) ^ 1, where each halo contains on average only one 
galaxy, and then cut off below a minimum mass threshold. 

An alternative approach is to assume the existence of 
two separate galaxy populations: central halo galaxies 
(zero or one per halo) and satellite galaxies within halos. 
These populations can then be modeled separately (e.g., 
Guzik & Seljak 2002). In particular, in the most simple 
case the HOD of central galaxies can be modeled as a step 
function {Nc)m = 1 with (iVc)Af = for M < M^in, 
while the HOD of the satellite galaxies can be modeled as 
a power law, {Ns)m oc . The motivation for the sepa- 
ration of central and satellites galaxies comes partly from 
the analysis of hydrodynamic simulations (Berlind et al. 
2003), and partly from studies of central bright elliptical 
galaxies in groups and clusters, which are often considered 
as a separate population from the rest of galaxies. As we 
show below, such a separation greatly simplifies the HOD 
analysis. 

Several simple distributions have been considered for the 
shape of the HOD. The Poisson distribution is fully spec- 
ified by its first moment {N), as the high-order moments 
are simply 

{N{N-l)...{N-j)) = {Ny+\ (6) 
For the nearest mfe^er distribution with Ni < (N) < Ni+1 
(where Ni is an integer), the second and third moments 
are 

{N{N-l)) = {Nf{l+^2) 
(N{N - 1){N - 2)) = (Ar)3 (1 + 36 + 6) , (7) 
where the volume- averaged connected correlations, ^(M) 
and ^3{M), are (Berlind et al. 2003) 

NijNi + l) 2Ni 
{N)' {N) 

- ^ 2Ni{Nf - 1) mf 

(iV)3 ^ (Ar)2 (AT) 
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1, 



+ 2. 



(8) 
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For the binomial distribution 

PiN = n\M) = -m)^'^-", (9) 

with mean {N)m = J^mPm, the second moment is {N{N — 
1))m = J^mPm{J^mPm — Pm) and the higher-order mo- 
ments are given by 

{N{N - l)...{N-j)) = c?{2o? - - j + \){Ny^\ 

(10) 

where the parameter a is defined as 

al, EE (7V(iV _ l)),,/(7V)2,. (11) 

The function oi\,f is a convenient measure of how differ- 
ent P(A^|M) is from the Poisson distribution, for which 
alj = 1. For distributions narrower than the Poisson 
oq^j < 1, while for broader distributions a\.j > 1. Semi- 
analytic models and hydrodynamic simulations predict a 
significantly sub-Poisson P{N\M) distribution at low (N) 
(Berlind et al. 2003 and references therein). Moreover, it 
has been shown that a sub-Poisson P{N\M) distribution 
is required in order to produce a correlation function of 
the observed power-law form. (Benson et al. 2000b; Sel- 
jak 2000; Peacock & Smith 2000; Scoccimarro et al. 2001; 
Berlind & Weinberg 2002) 

6. RESULTS 

6.1. The Halo Occupation Distribution 

We start discussion of our results with the factorial mo- 
ments of the HOD defined in the previous section. Figure 4 
shows the first moment of the HOD for the halo sample 
with number density of 5.86 x 10~^/i^ Mpc~^ (Knax > 
70 kms^^). Given that the halo samples are constructed 
by simply selecting all halos with circular velocities larger 
than a threshold value, the HOD will have a trivial com- 
ponent corresponding to the host halo: 

. r _ / 1 for Mh > Minin , . 

for Mh < M„i„ ^^^> 

where M^i^ is the mass corresponding to the threshold of 
the maximum circular velocity of the sample. The first 
moment of this component is simply a step-like function 
shown in the bottom panel of Figure 4 by the dotted line. 
Note that halo samples are defined using a threshold Vmax, 
while the HOD is plotted as a function of halo mass. The 
transition from zero to unity is therefore smooth because 
certain scatter exists between Vmax and halo mass (Bullock 
et al. 2001b). We find that the scatter is approximately 
gaussian and its effect on (Nc) can be described as 

(iVe) =erf(5[l-M/Mmin]). (13) 
The second HOD component corresponds to the proba- 
bility for a halo of mass M to host a given number of sub- 
halos Ns ^ N -1: Ps{Ns\M) = P{Ns + 1|M). The first 
moment of this component is shown by the long-dashed 
line. As we noted above, this separation is equivalent to 
differentiating between central and satellite galaxies in ob- 
servations or in semi-analytic models. 

The first three moments of Ps{Ns\M) are related to the 
moments of the overall HOD as follows 

{Ns) = {N) - 1; (14) 
{NsiNs^l)) = {NiN-l))-2{N) + 2; (15) 
{NsiN, - 1){N, - 2)) = {NiN -l)iN- 2)) - 

3{N{N~1)) + 6{{N)-1116) 
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Fig. 4. — Bottom panel: the first moment of the halo occupa- 
tion distribution, as a function of host mass for the halo sample 
with number density n = 5.86 X 10~^h^Mpc~^ in the ACDMgo 
simulation at 2; = 0. The solid line shows the mean total number 
of halos including the hosts, while the long-dashed line shows the 
mean number of satellite halos. The error bars show the uncertainty 
in the mean. The dotted line shows the step function corresponding 
to the mean number of "central" halos. Note that by definition, the 
solid line is the sum of the dotted and long-dashed lines. The two 
short-dashed lines indicate the dependencies oc Mi^ and M^'*. Up- 
per panel: the parameter a = {N(N - 1))^^^/{N) for the full HOD 
(solid points) and the HOD of satellite halos (open points). The 
dotted line at a = 1 shows the case of a Poisson distribution. Note 
that the HOD becomes sub-Poisson at small host masses. However, 
the HOD of satellites remains close to Poisson down to masses an 
order of magnitude smaller than for the full HOD. Indeed, if the 
satellite HOD is Poisson, q = (1 - l/(Af)2)i/2 for the full HOD [see 
eq. (17)]. This expression is shown by the dot-dashed line, which de- 
scribes the points very well. The full HOD at small Mj^ is also well 
described by the nearest integer distribution [see cqs. (7) and (8)] 
shown by the dashed line. 



As can be seen from Figure 4, (Ng) has a simple power 
law form, while the shape of the full (N) (shown by the 
solid line) is complicated and consists of a step, a shoul- 
der, and the high-mass power-law tail. The parameter a 
plotted in the upper panel indicates that both P{N\M) 
and Ps{Ns\M) are close to Poisson at high masses and 
become sub-Poisson as the host mass approaches the min- 
imum mass of the sample. However, the satellite HOD 
can be described by the Poisson distribution down to host 
masses an order of magnitude smaller than the full HOD. 
The latter is well described by the nearest integer distribu- 
tion (see eqs. [7] and [8]) at small Mh. This result suggests 
a simple model for the HOD: every host halo contains one 
halo (itself) and a number of satellite subhalos drawn from 
a Poisson distribution whose mean is a power-law function 
of the host halo mass. 

Note that the Poisson shape of the subhalo HOD at 
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small masses implies a non-Poisson shape for the overall 
HOD, as can be seen from equations (14)-(16). For exam- 
ple, for masses where Ps{Ns\M) is Poisson, we have 
, _ {NjN ~ 1)) 1 

which shows that the shape is Poisson (a^ = 1) at high 
masses but drops to zero at low masses. The HOD thus 
starts to deviate from Poisson significantly at {N) < 4 



v2 _ 



= 0.81 for {N) = 2.3, while = for (A^) = 1. 



As 

can be seen in the upper panel of Figure 4, equation (17) 
describes a measured in the simulation very well. 

Figure 5 shows the mean of the subhalo HOD, (Ns), for 
different epochs and samples of different number densities. 
The mean is plotted as a function of mass in units of the 
minimum mass of the sample, n = Mh/A'/,nin. The figure 
shows that the mean number of subhalos as a function of 
H at different number densities is remarkably similar and 
exhibits only a weak evolution with time. The mass de- 
pendence is approximately linear {Ns)fj_ oc /i for masses 
^ > 5 (or (Ns) > 0.2). The formal linear fits to the {Ns)m 
of individual samples result in best fit slopes close to unity 
for Mh/Mmin ^ 5 with rather small deviations from the 
linear behavior. The best fit slopes for the samples of dif- 
ferent number densities at z = are 0.99±0.01, 0.92±0.03, 
0.96 ± 0.08, 1.04 ± 0.08, and 0.61 ± 0.21 in the order of the 
decreasing number density. As a function of redshift, the 
best fit slopes for the sample of n = 5.86 x 10~^/i^ Mpc~^ 
are 1.03 ± 0.01, 1.05 ± 0.02, 1.18 ± 0.05, 1.28 ± 0.04, for 
z = 0,1,3,5, respectively. 

Although {Ns) is close to the power law for most of the 
most of the mass range, the deviations from power law are 
evident at M ~ Mmin, especially at low redshifts. A more 
accurate formula describing the first moment is 

{Ns) = {M/Mi - Cf, (18) 
where Mi is normalization, defined as {Ns{Mi)) = 1, and 
C is a constant for a given redshift. Parameters Mi and C 
exhibit mild evolution with redshift. For z = 0, C !v 0.045 
or Ml /Mmin « 22. 

The slope of the high- mass tail of {N)m, is one of the 
key factors determining galaxy clustering statistics. In our 
results the asymptotic slope of {Ng) in the total mean {N) 
is reached only at relatively high masses. A linear fit at 
intermediate masses is likely to result in a shallower slope. 
Thus, for example, a power-law fit {N) oc to the full 
HOD shown in Figure 4 for (TV) > 4 gives /? = 0.87 ± 0.01, 
while the fit to {Ns) for the same range of masses gives 
(3s = 1.03 ±0.01. This may explain the smaller than unity 
slopes of the mean occupation number obtained in several 
theoretical and observational analyses (e.g., Berlind et al. 
2003; Zehavi et al. 2003). These estimates may therefore 
be underestimates of the true asymptotic high-mass slope. 

Figure 6 shows a comparison of the {Ns)m for the n = 
2.79 X IQ-'^h^ Mpc~3 halo sample in the ACDMgo and 
ACDMeo simulations at 2; = and z = 3. The subhalo 
HODs in the two runs agree well over the most of the mass 
range at both epochs. The shape of Ps{Ns\ij), therefore, is 
not sensitive to the normalization of the power spectrum. 
The systematically lower values of {Ng) at low Mh/M,nin 
in the ACDMgo simulation arc expected. Halos in this 
higher-(T8 cosmology form earlier and disruption processes 
have more time to operate and lower the number of satel- 
lites with masses comparable to that of the host. This 
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Fig. 6. — The mean number of subhalos {Ns) as a function of 
host mass for the halo samples with number density n = 2.79 X 
lO'^/i^Mpc"^ in the ACDMeo {open circles) and ACDMgo (filled 
circles) at 2 = (top panel) and z = 3 (bottom panel). The error 
bars show the uncertainty in the mean and are smaller than the 
symbols. The mean is plotted as a function of host mass in units of 
the minimum mass of the sample (see Fig. 3). The solid line in each 
panel shows the linear relation (Ns) oc Mf^. The figure shows that 
the HOD for a given Mh/Mmin is not sensitive to normalization of 
the power spectrum. 



difference can also be partly due to the limited mass reso- 
lution of the simulations. 

In Figure 7 we plot the square root of the second and 
the cube root of the third moments of the subhalo HOD 
for the samples and epochs shown in Figure 5. For com- 
parison the solid lines show the linear function {Ng) oc fi 
of the same amplitude as in the corresponding panel of 
Figure 5. Figure 7 shows that {Ns) w {Ns{Ns - 1))^/^ w 
{Ns{Ns - 1)(7V,, - 2))^/^ for // > 5, as c;xpcctcd for the 
Poisson distribution (eq. [6]). Therefore, Ps{Ns\fJ.) can be 
described by the Poisson distribution at these masses. As 
in the case of the mean, the higher moments of the subhalo 
HOD have similar shape and amplitude as a function of n 
for diflferent number densities and redshifts. 

6.2. The halo 2-point correlation function 

In this section we present the two-point correlation func- 
tions (CFs) for the halo samples used in the analysis of 
the HODs. Figure 8 shows the CFs for the sample of 
n = 5.89 X 10~'^/i'^ Mpc^-'' at z = 5, 3, 2, and 0, as well 
as their one- and two-halo components. The error bars 
shown for the CFs are the errors in the mean, estimated 
from jack-knife resampling using the eight octants of the 
simulation cube (see Weinberg et al. 2003), and they are 
dominated by the "cosmic variance" of the finite number 
of coherent structures in the simulation volume. Several 
interesting features are immediately apparent. First, at 
scales > Q.'ih~^ Mpc the CFs at all epochs can be well 
described by a power law, ^(r) = (?'/?'o) ''j with only 
mildly evolving amplitude and slope. The amplitude of 
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Fig. 5. — The mean number of subhalos (A^s) as a function of host mass for the halo samples of different number densities (symbols of 
different type) at different redshifts. The error bars show the uncertainty in the mean. The mean is plotted as a function of host mass in 
units of the minimum mass of the sample (see Fig. 3; note that, by definition, (Afg) = at Mji/Mmin = 1 and non-zero points shown at or 
below Mi,/Mi,iin = 1 are caused by binning). The mass in the x-axis is the mass within the radius corresponding to overdensity 180 with 
respect to the mean density of the universe. The number densities in units of h^Mpc"^ are indicated in the legend. The solid line in each 
panel shows the linear relation (A^s) oc M}^. The figure shows that the mean number of subhalos at different number densities is remarkably 
similar and shows only a mild evolution. The mass dependence is approximately linear (A''s) oc for masses A/h/A^min ^ 5 (or (A^s) ^ 0.2). 
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Fig. 7. — The square root of the second moment, {Ns{Ns — l))"'^'''^ (filled symbols), and the cube root of the third moment(Afs (A''^, — l)(Ns — 
2))^/^ (open symbols, except for the lowest number density which is shown by crosses) for the halo samples of different number densities 
(symbols of different type) at different redshifts. The moments are plotted as a function of host mass in units of the minimum mass of 
the sample. The number densities in units of /i^Mpc"'' are indicated in the legend. The solid line in each panel shows the linear relation 
(Ns) tx Mh of the same amplitude as the solid line in the corresponding panel of Figure 5. The figure shows that the HODs at different 
number densities and epochs are remarkably similar and are close to the Poisson distribution at Mij/Af,„in > 5 (or {Ng) > 0.2 — 0.3). 
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Fig. 8. — Evolution of the two-point correlation function in the 80h~^ Mpc simulation. The solid lines with error bars show the clustering of 
halos of fixed number density n = 5.89 X I0~'^h^ Mpc"'' at each epoch. The error-bars indicate the "jack-knife" one sigma errors, computed 
using the eight octants of the simulation cube, and are larger than the Poisson error at all scales. The dot-dashed and dashed lines show the 
corresponding one- and two-halo term contributions. The long-dashed lines show the power-law fit to the correlation functions in the range 
of r = [O.l — 8h~^ Mpc] . Although the correlation functions can be well fit by a power law at r > 0.3h~^ Mpc in each epoch, at z > the 
correlation function steepens significantly at smaller scales due to the one-halo term. For comparison, the dotted lines show the correlation 
function of the dark matter. 
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the dark matter correlation function, on tlic; other hand, 
increases with redshift reveahng strongly time-dependent 
bias. At the present-day epoch, there is a slight antib- 
ias at r < lh~^ Mpc. Interestingly, the magnitude of 
the anti-bias is considerably smaller than in the higher- 
normalization {as = 1) simulation (see Fig. 7 in Colin 
et al. (1999) as well as Fig. 9 in this paper). This is 
consistent with the picture where the anti-bias is caused 
by the halo disruption processes in high-density regions 
(Kravtsov & Klypin 1999), as groups and clusters in the 
low-normalization model form later and the disruption 
processes have less time to operate. The exclusion effect 
in the two- halo component is significant at z = 0, but 
diminishes at earlier epochs. This is due mainly to the 
systematic decrease in the minimiim mass of the sample 
for the same number density at higher z. The smaller 
minimum mass means smaller halo sizes. The smaller size 
is also due to the definition of the virial radius with re- 
spect to the mean density. Even for the same mass higher 
mean density of the Universe at higher redshifts results in 
a smaller virial radius. Smaller sizes of halos in the sam- 
ple result in smaller minimum pair separation for isolated 
objects. Thus, 2-halo term extends to smaller r. 

At z = the halo CF can be well approximated by a 
power law at all probed scales (0.1 — 10h~^ Mpc). The ap- 
proximate power-law shape is due to the relatively smooth 
transition between the two- and one-halo components of 
the CF. At higher redshifts, however, the transition is more 
pronounced and occurs at progressively smaller scales. This 
results in a significant steepening of the CF at ~ 0.3 — 
lh~^ Mpc. The halo model analysis shows that contri- 
bution of pairs in massive galaxy clusters is critical for 
a smooth transition between 1- and 2-halo contributions 
(Berlind & Weinberg 2002). At earlier epochs, clusters 
are rare or non-existent which explains a more pronounced 
transition. 

Indeed, power-law fits using the range of scales 0.1 — 
8h~^ Mpc give systematically smaller values of the scale 
radius ro and steeper slope 7 than the fits over range 
~ 0.3 — 8h~^ Mpc, as can be seen in Figure 12. All 
of the fits for the ACDMgo simulations are performed at 
r < 8h~^ Mpc, as the CF shape becomes affected at scales 
larger > O.lLbox (Colin et al. 1997; Colin et al. 1999). We 
also checked this by comparing matter correlation func- 
tions in the simulation to the model of Smith et al. (2003). 
We find that simulation results agree well with the model 
at scales < O.lLbox at z = and at < 0.2 — O.SLbox at 
higher redshifts. 

The power-law shape of the correlation function is rather 
remarkable, as it appears to result from a sum of non- 
power law components. We checked the components of the 
correlation function due to pairs of different types: central- 
satellite, satellite-satellite, and central-central pairs. The 
component CFs have a variety of shapes all deviating strongly 
from power law. Yet, the sum is close to the power law. 
This indicates that the power-law shape of the galaxy cor- 
relation function may well be a coincidence, as noted by 
Benson et al. (2000b) and Berlind & Weinberg (2002). 

Comparing the correlation functions in the ACDMgo 
and ACDMgo simulations (Figure 9), we find that the cor- 
relation functions of objects with the same number density 
are similar. This is not surprising in light of the approxi- 
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Fig. 9. — The correlation function and bias for the 11 = 
1.52x 10~^h^ Mpc~^ sample in the ACDMeo (dashed) and ACDMgo 
(solid) simulations. Top panel: The bias b(r) = ■\/^hh('')/€mm(j'). 
Bottom panel: The halo-halo correlation function in the two sim- 
ulations compared to the correlation function of the APM galaxies 
(Baugh 1996). The error-bars indicate the "jack-knife" one sigma 
errors, computed using the eight octants of the simulation cube, and 
are larger than the Poisson error at all scales. 

mate universality of the HOD demonstrated in the previ- 
ous section (see Figs. 5 and 6). Figure 9 also shows that 
the amplitude and shape of the CF at z = is in good 
agreement with that of the galaxies in the APM survey. 
As noted by Kravtsov & Klypin (1999) and Colin et al. 
(1999), the close agreement of halo and galaxy correlation 
functions indicates that the overall clustering of the galaxy 
population is determined by the distribution of their dark 
matter halos. 

Figure 10 shows a comparison of the projected correla- 
tion functions: 

'"max 

w,{r,)=2 j i{[rl + y^Y'^)dy, (19) 


in the volume- limited sample of bright, M,. < — 21, galax- 
ies (Zehavi et al. 2003) and halo samples of three represen- 
tative number densities in the ACDMgo simulation. The 
upper integration limit was set to rmax = 40/i~^ Mpc, to 
mimick the procedure used to estimated the observed CF. 
In taking the projection integral, we extrapolate the sim- 
ulated CF from Sh~^ Mpc (the largest reliable scale of the 
simulation) to large scales using the correlation function 
of dark matter predicted by the Smith et al. (2003) model 
rescaled to match the amplitude of the halo correlation 
function at 8/i~^ Mpc. Note that the SDSS galaxies have 
a number density of 1.1 x lQ~^h^ Mpc~^ and their CF 
should therefore be compared to the solid line. The fig- 
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Fig. 10. — The projected correlation function of the bright 
(Mr < —21) galaxies in the SDSS volume-limited sample (Zehavi 
et al. 2003) compared to the z = projected correlation function of 
halo samples of three different number densities indicated in the 
legend. Note that A/^ < —21 galaxies have number density of 
1.1 X 10-3/i3 Mpc"^. The fi gure shows that the correlation func- 
tions of galaxies and halos of the same number density are in good 
agreement (see text for discussion). 



ure shows that the correlation functions of galaxies and 
halos in our simulations agree remarkably well. In partic- 
ular, the steepening of the observed correlation function 
at r < lh~^ Mpc is reproduced. The difference at large 
scales is not significant given the large sample variance er- 
rors in the halo correlation function that result from the 
small size of the simulation cube (see Figures 8 and 9). 

Large galaxy redshift surveys have been used to detect 
a luminosity dependence of galaxy clustering (e.g., Guzzo 
et al. 1997; Willmer et al. 1998; Norberg et al. 2001; Zehavi 
et al. 2002). There are indications that a similar depen- 
dence exists at early epochs (e.g., Giavalisco & Dickinson 
2001). As the luminosity of galaxies is expected to be 
tightly correlated with the halo maximum circular velocity 
or mass, the luminosity-dependence of clustering should be 
reflected in the mass-dependence of halo clustering. Fig- 
ures 11 and 12 show the best-fit correlation length ro and 
slope 7 as a function of sample number density, with lower 
number densities corresponding to samples of halos with 
larger mass (i.e., larger values of threshold Knax)- Fig- 
ure 11 shows the z — results compared to the 2dF (Nor- 
berg et al. 2001) and SDSS galaxy surveys (Zehavi et al. 
2002; Budavari et al. 2003). The figure shows that the 
dependence of halo clustering on sample number density 
in our simulation is in general agreement with the SDSS 
Early Data Release results (Zehavi et al. 2002). 

The simulation points are systematically higher than 
the 2dF and Budavari et al. (2003) results. Note, how- 
ever, that the upturn in the clustering amplitude occurs 
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Fig. 11. — Top panel: the best-fit correlation scale ro as a func- 
tion of number density at the present day epoch. The results for the 
dark matter halos (open triangles) are compared to the recent mea- 
surements of galaxy clustering in the SDSS (solid circles are from 
the analysis of the Early Data Release by Zehavi et al. (2002), while 
open circles are derived from the analysis of the angular correlations 
by Budavari et al. (2003)) and 2dF surveys (solid squares; Norberg 
et al. 2002). The upper axis shows the r-band absolute magni- 
tude for the SDSS galaxies corresponding to each number density. 
The power-law fits were done over the range of scales from 0.3 to 
Sh^^ Mpc. Bottom panel: the best- fit slope of the correlation func- 
tion as a function of number density. 



at approximately the same number density, n 2 — 4 x 
10~^/i^ Mpc~'^, in the simulation and 2dF survey. The 
difference in amplitude can likely be attributed to the fact 
that 2dF galaxies are selected using a blue-band magni- 
tude, since several recent studies have shown that redder 
galaxies are clustered more strongly (e.g., Norberg et al. 
2001; Zehavi et al. 2002). In addition, the halo samples in- 
clude all objects above a threshold circular velocity, while 
most of the observational points in Figure 11 are defined 
for galaxies in (broad) luminosity ranges. 

Interestingly, all of the observational estimates indicate 
that the slope of the CF does not depend strongly on the 
luminosity. The slope of the halo CF is in agreement with 
observations for n < O.Ol/i^ Mpc~^ but becomes shallower 
for smaller mass objects. At n « 0.02/i^ Mpc^^ the slope 
for the halo sample is significantly shallower than that 
for the galaxies in the 2dF and SDSS surveys. This in- 
dicates that luminosity dependence of the CF slope may 
provide additional useful constraints on galaxy formation. 
The exercise demonstrates that both slope and correlation 
length should be compared when model predictions are 
confronted with observations. 

Mass dependence of the clustering amplitude is also 
found at earlier epochs (Fig. 12). The clustering of halos at 
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2; = 3 is in reasonable agreement with clustering of Lyman 
break galaxies (LBGs, Adelberger 2000; Adelberger et al. 
2003). The detailed comparison is complicated due to the 
often contradictory results from analyses that use differ- 
ent LBG samples and methods (for a summary of recent 
results see Bullock, Wechsler, & Somerville 2002, and § 7 
below). An important point is that, as we noted above, at 
higher redshifts the steepening of the CF at small scales 
biases results if a single power-law is fit down to small 
scales. This bias can be clearly seen in Figxuc 12, which 
shows the best-fit scale radius and slope for the power-law 
fits down to both 0.1h~'^ Mpc and 0.3/i~^ Mpc. At both 
z = 1 and z = 3, fits to smaller scales result in smaller ro 
and larger absolute values of the slope 7. 

7. DISCUSSION 

7.1. Implications for galaxy clustering 

The results presented in the previous sections show that 
the main properties of the observed clustering of galaxies 
are imprinted in the distribution of their surrounding ha- 
los. The term halo here includes both the isolated halos 
and the distinct gravitationally-bound subhalos within the 
virialized regions of larger systems. 

The shape and evolution of the two-point correlation 
function and the mass-dependence of clustering amplitude 
for these subhalos are in good agreement with constraints, 
for galaxies of similar number densities, from recent obser- 
vational surveys at a range of redshifts. In addition, the 
halo occupation distribution derived in this study for the 
subhalos is similar to the HOD obtained for the galaxies 
in semi-analytic analyses and for cold gas clumps in gas- 
dynamics simulations (Berlind et al. 2003). All of these 
are consistent with present observational constraints on 
the galaxy HOD (e.g., Zehavi et al. 2003), for galaxies of 
similar number densities; however, observational measure- 
ments of the HOD are not yet robust enough to make 
a meaningful comparison. The test of our predictions 
against real data must thus await future measurements. 

This result has several implications. First and foremost, 
it means that the formation of halos and their subsequent 
merging and dynamical evolution are the main processes 
shaping galaxy clustering. This appears to be true for the 
clustering of halo samples with maximum circular veloc- 
ities larger than a given threshold value, which were the 
focus of the present and other recent studies (Kravtsov 
& Klypin 1999; Colin et al. 1999; Neyrinck et al. 2003). 
This type of halo sample should correspond to volume- 
limited samples of galaxies with luminosities above a cer- 
tain threshold value because the maximum circular veloc- 
ity of a halo is expected to be tightly correlated with the 
luminosity of the galaxy it hosts. The caveat, of course, is 
that we can expect a considerable band-dependent scatter 
between galaxy luminosity and Knax, which needs to be 
accounted for in the model. The inclusion of scatter is rel- 
atively straightforward and future analysis will show just 
how large the effect of scatter is. 

Useful constraints on galaxy evolution and better un- 
derstanding of galaxy clustering can be obtained by more 
sophisticated analyses. For example, it would be inter- 
esting to compare the HOD of galaxies of different colors 
(e.g., Zehavi et al. 2002) with that of halos with differ- 
ent merger histories or environments, which would likely 



provide insight into the formation of galaxies of different 
types. 

One of the most interesting features of galaxy clustering 
is the approximately power-law shape of the two-point cor- 
relation function. Although departures from a power law 
have been found (Zehavi et al. 2003), they are quite small. 
This has been and still is a major puzzle in galaxy for- 
mation studies, especially in light of the strong deviations 
from the power-law behavior seen in the dark matter corre- 
lation function (Klypin et al. 1996; Jenkins et al. 1998). In 
the framework of the recently developed halo model, there 
also does not seem to be a generic way to produce a purely 
power-law CF (Berlind & Weinberg 2002). The power- law 
shape thus appears to be somewhat of a coincidence. 

Without a doubt, the fact that the (approximately) power- 
law CF observed for galaxies is reproduced with subhalo 
populations identified in simulations with the correct slope 
and amplitude at z = down to scales < lQOh~^ kpc 
can be viewed as a significant success. The power-law 
nature of the correlation fucntion is due to a relatively 
smooth transition between its one- and two-halo terms. 
At higher redshifts this transition is more pronounced and 
the CF steepens at small scales. The transition scale de- 
creases with increasing redshift, reflecting the decrease in 
the average halo size with time. The key to the puzzle of 
the power-law shape of the CF thus appears to be in un- 
derstanding the transition between the one- and two-halo 
terms. 

Interestingly, if similar large departures from a power 
law exist for real high-redshift galaxies, this may bias ob- 
servational power-law fits and explain some of the discrep- 
ancies between observational analyses. For example, as 
can be seen in Figure 12. single power-law fits to smaller 
radii result in systematically smaller values of ro and steeper 
slopes 7. If the slope is kept fixed, as is often done in anal- 
yses of high-z galaxy clustering, the derived correlation 
length may be artificially large. Using the halo model for- 
malism, Zheng (2003) recently showed that the large cor- 
relation length inferred for red galaxies at z « 3 by Daddi 
et al. (2003) can be explained by the steepening of the CF 
at small scales, as observed for subhalos in our simulation. 
Daddi et al. (2003) studied clustering of red galaxies in the 
Hubble Deep Field South. The number density and range 
of radii probed in their study are n » 7 x 10~^h^ Mpc~^ 
and ~ 0.04 — lh~^, with the statistically significant corre- 
lations detected only for r < 400hr^ kpc. All scales here 
and below are computed assuming the flat ACDM cos- 
mology adopted in this paper. Given a limited number of 
radial bins, Daddi et al. (2003) used a power-law fit to the 
angular correlation function with a fixed slope of —0.8 and 
obtained the best fit correlation length of ro « 8h~^ Mpc. 

As a comparison, for the halo sample with the number 
density of 6 x W~^h^ Mpc~^ in our simulation, a weighted 
least squares fit over the interval 40 — 400h~^ kpc with 
the slope fixed to —1.8 gives ro = 7.85 ± 0.77/i~^ Mpc, re- 
markably close to the value derived by Daddi et al. (2003). 
This, however, is simply a reflection of steepening of the 
CF at small scales. If both the slope and the correla- 
tion length are allowed to vary, best fit values for the 
same range of scales are rg = 2.04 ± 0.76/i~^ Mpc and 
7 = 2.69 ± 0.18. At the same time, the correlation func- 
tion is unity at ro » 5h~^ Mpc and the power-law fit over 
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Fig. 12. — The same as in Fig. 11 but for 2 = 1 and 2 = 3. At high redshifts the correlation function significantly steepens at small 
separations due to the one-halo correlations (see Fig. 8). This makes the results of the power-law fit sensitive to the minimum separation 
rjnin used in the fit. The figure shows results for two values r^\^: 100/i~^ kpc (solid triangles) and 300/i~^ kpc (open triangles). At z = 3 
the results are compared to the recent observational measurements of Lyman Break Galaxy clustering at 2 ~ 3. 



the interval 0.3 -8/1"^ Mpc gives tq = 4.93±0.49/i-i Mpc 
and 7 = 1.72 ± 0.13 (see Fig. 12), in very good agreement 
with the halo model calculations of Zheng (2003). 

Similar biases may explain some of the discrepant re- 
sults on the clustering of Lyman break galaxies (LBGs), 
virtually all of which have assumed a power-law correla- 
tion function, and many of which have fixed the power- 
law slope during fits. Many studies presented seemingly 
contradictory measurements of correlation lengths ranging 
from 1 to 5/i~^ Mpc (Giavalisco & Dickinson 2001; Bullock 
et al. 2002; Adelberger et al. 2003; Porciani & Giavalisco 
2002; Arnouts et al. 1999). The variety of values and dis- 
crepancies of the correlation length and slope could be 
explained by departures of the high- 2: CF from the single 
power law. Thus, for example, the power-law fit over the 
smallest angular scales in the HDF sample by Giavalisco 
& Dickinson (2001) gives the smallest value of rg and the 
steepest slope 7. The analysis of Arnouts et al. (1999) fits 
the CF over a larger range of scales with the slope fixed 
at a relatively low value and results in a larger value of rg . 
This would be expected if the CF steepens at small scales, 
as observed in our simulations. All of the studies perform 
the Limber deprojection assuming a power-law CF, which 
may further bias results. The implication is that if the 
deviations from the power law for the galaxy CF are as 
strong as indicated by our results, the assumption of the 
single power law is dangerous and is likely to bias results. 
The magnitude of the bias depends on the range of scales 
probed and the analysis method. 

On the positive side, the sharper transition from one- 
to two-halo components of the CF at early epochs means 



that departures from the power law may be easier to de- 
tect in high-redshift surveys than they are at 2: = 0. Such 
features can be useful in understanding the environments 
and nature of galaxies because the one-halo term contains 
information about P{N\M) and the radial distribution of 
galaxies in halos. For example, we can expect the distri- 
bution of red galaxies to be biased toward the high-density 
regions of groups and clusters. Their correlation function 
is therefore expected to have a more pronounced one-halo 
term. Although the sizes of samples are still relatively 
small at present, it is interesting that the projected CF of 
the largest LEG galaxy samples to date presented by Adel- 
berger et al. (2003) (their Fig. 23; see also Hamana et al. 
2003) indicates a departure from the power law similar to 
that in z = 3 panel of Fig. 8. 

7.2. The shape of the halo occupation distribution 

The main result of our study is the approximate univer- 
sality of the halo occupation distribution, Ps{Ns\^) where 
/i = M/Mjnin, for the subhalos in our samples. We show 
that the overall HOD can be split into the probability for 
a halo of mass M to host a central galaxy and its prob- 
ability to host a given number Ng of satellite galaxies, 
which significantly simplifies the halo model. The former 
can be approximated by a step function, while the lat- 
ter is well approximated by the Poisson distribution fully 
specified by its first moment (Ns) for /i > 5. The first 
moment of the distribution is well represented by a sim- 
ple power law {N) oc for /i ^ 5 with /3 close to unity 
for a wide range of number densities and redshifts. We 
also find that the form of the satellite HOD is not sen- 
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sitive to the normalization of the power spectrum. Note 
that although the amplitude of {N){fj,) changes little for 
different number densities, redshifts, and spectrum nor- 
malizations, some weak dependencies do exist. Thus, for 
example, the first moment for the lower number density 
(i.e., higher mass) samples have a systematically higher 
amplitude. Moreover, there is a factor of three increase in 
the normalization from z = to z = 5. The HOD moment 
in the lower-normalization simulation has a slightly higher 
amplitude than in the higher normalization run (Fig. 6). 

It is worth noting that results presented here for the 
dark matter halos are in good agreement with the results 
of semi-analytic and gasdynamics simulations obtained by 
Berlind et al. (2003). In particular, we checked that our 
results on the HOD of the satellite galaxies are in very 
good agreement with the results of these simulations. 

The approximately linear dependence of the first mo- 
ment on the host mass, (TV.,) oc M, is related to the 
shape of the subhalo mass and velocity functions: Ns{> 
Vsuh) = ^(V^ub/Vh)"'', where is the number of sub- 
halos within the virial radius of the host and I^ub and 
Vh are the maximum circular velocity of subhalos and the 
host, respectively. High-resolution simulations give ry « 3 
(Klypin et al. 1999; Moore et al. 1999) and normaliza- 
tion approximately independent of mass A const (Moore 
et al. 1999; Colm et al. 2003). Therefore, the number of 
subhalos with Vsuh above a certain threshold Vth scales 
as N,{> VihlAfh) = ^V^j^''!/,''. The circular velocity of 
isolated host halos is tightly correlated with their mass 
Mh = CV^ with a » 3 - 3.3 (e.g., Avila-Reese et al. 1999; 
Bullock et al. 2001b). We thus have N{> Vth\Mh) oc 
with P = r]/a ~ 1. 

The simple combination of a step function represent- 
ing the central galaxies and a Poisson distribution for the 
satellite galaxies can be compared to the sum of the HODs 
for the two, which is in general significantly more compli- 
cated, consisting of a step, a shoulder, and a power-law 
high mass tail, as observed in our simulations and in sim- 
ulations studied by Berlind et al. (2003). This shape is 
sometimes approximated simply by a power law (e.g.. Bul- 
lock et al. 2002) or by the combination of a step function 
and a power law for masses larger than some Mp (e.g., Ze- 
havi et al. 2003). However, these are only crude approx- 
imations, as the first moment of the HOD in simulations 
is almost never flat, especially at high redshifts. Mod- 
cling of the HOD as a combination of host and satellite 
HODs provides a considerably more accurate prescription 
without increasing the number of parameters. It is also 
more physically motivated because it is reasonable to ex- 
pect that the processes that control the formation of the 
central galaxy are different from those that control the 
abundance of satellite galaxies. 

Most importantly, the simple form of the satellite HOD 
hints at some simple physical processes that control the 
satellite population. Understanding these processes is well 
within the capabilities of current numerical simulations or 
semi-analytic models for satellite accretion and orbital evo- 
lution (e.g.. Bullock et al. 2000; Zentner & Bullock 2003). 
Further theoretical studies of the satellite HOD shoidd 
thus provide key clues to the understanding of small-scale 
galaxy bias and the power-law shape of the correlation 
function. 



7.3. Halo occupation distribution model 

Our results suggest a simple yet accurate model for the 
halo occupation distribution for samples selected with a 
given mass or luminosity threshold, which could be used 
in theoretical modeling and model fits to the observational 
data. The probability for a halo of mass M to host iV 
galaxies, P{N\M), is split into probability to host one cen- 
tral galaxy Pc(M) and Ng satellite galaxies Ps{Ns\M) = 
P{Ns + 1|-M). In the simplest case when halos are se- 
lected using a quantity tightly correlated with mass (e.g., 
the maximum circular velocity in our analysis), the dis- 
tribution Pc{M) can be approximated by a step function 
(eq. [12]) changing from zero to unity at Mmin, the mass 
corresponding to the threshold quantity. If selection is 
done using a quantity correlated with mass with a sig- 
nificant scatter, such as galaxy luminosity, the transition 
from zero to unity at M > Mmin will be smoother. The 
transition can be modeled to take into account the scatter 
in the mass-luminosity relation and other sample selec- 
tion effects. For example, equation (13) above shows how 
a gaussian scatter between luminosity and mass could be 
taken into account. 

The probability distribution Ps{Ns\M) is Poisson for 
M > Mmin and is defined by its first moment, given by 

(TV,) = [M/M.f, (20) 

and by higher moments, given by eq. (6). Our results 
(Figs. 5 and 7) give Mi w 30Mmi„ for z = 0, 20Mmin for 
z = 1, lOMmin for z = 3,5 and /3 « 1 for all number den- 
sities and redshifts. A more accurate formula describing 
the first moment is given by equation (18). 

To relate this HOD model to a population of galaxies 
with a known spatial number density n in a given cosmol- 
ogy we have: 

POO 

n(>M^i„)= / {N){M,M^,u)n^{M)dM 

« / [l + {MlMif]rn,{M)dM, (21) 

where nh(-M) is the theoretical mass function of host ha- 
los (e.g., Sheth & Tormen 1999). This function can be 
inverted to estimate Mmin. An approximate estimate of 
Mmin can be obtained by using the approximation to the 
subhalo-l-host mass function: n(> Mmin) ~ nh(> — 
/sub)~^, instead of the integral in eq. (21). Here, /sub ~ 
0.15 — 0.25 for masses in galactic range (see Fig. 3). 

The first moment of the HOD distribution is {N) = 
{Ns) -t- 1, while the second and third moments and higher 
moments can be specified completely in terms of moments 
oiPs{Ns\M) as 

(iV(iV - 1)) = {N,{N, - 1)) + 2(iV,) 

= (iV,)((iV,)+2), 

{N{N - 1){N - 2)) = {N,)\{N,) + 3), etc. (22) 

The HOD moments can be used to calculate clustering 
statistics in the framework of the halo model. 

The model has at most three free parameters: the min- 
imum mass Mrnin of a halo that can host a galaxy in the 
sample, and the normalization and slope of the first mo- 
ment of the satellite galaxy HOD (M/Mi)^, where Mi is 
the halo mass corresponding to the average of one satel- 
lite galaxy. The model provides more accurate description 
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of the HOD in simulations and semi-analytic models com- 
pared to other models used in the literature, without an 
increase in the number of free parameters. 

8. CONCLUSIONS 

In this study we analyze the halo occupation distri- 
bution (HOD) and two-point correlation function (CF) 
of dark matter halos using high-resolution dissipationless 
simulations of the concordance ACDM model. Our main 
conclusions can be summarized as follows. 

• Wo find that the shape of the HOD, the probability 
distribution for a halo of mass M to host a number 
of subhalos N, P{N\M), is similar to that found for 
galaxies in semi-analytic and A''-body-|- gasdynam- 
ics studies (Berlind et al. 2003). 

• The first moment of the HOD, {N)m, has a com- 
plicated shape consisting of a step, a shoulder, and 
a power-law high mass tail. The HOD can be de- 
scribed by Poisson statistics at high halo masses 
but becomes sub-Poisson for (A^) m ^ 4. We show, 
however, that this behavior can be easily under- 
stood if the overall HOD is thought of as a com- 
bination of the probability for a halo of mass M 
to host a central galaxy, Ph{M), and the probabil- 
ity to host a given number Ng of satellite galaxies, 
Ps{Ns\M). The former can be approximated by a 
step-like function, with {Nc) = 1 for > Mj^in, 
while the latter can be well approximated by the 
Poisson distribution, fully specified by its first mo- 
ment. The first moment can be well described by 
a simple power law (Ng) (x /i'^ for ij, > 5 {iJ, = 
M/Mmin) with (3 close to unity for a wide range of 
number densities and redshifts. 

• We find that the satellite HOD, Pg{Ns\^J.), has a 
similar amplitude and shape for a wide range of halo 
number densities and redshifts. It is also not sensi- 
tive to the normalization of the power spectrum for 
objects of a fixed number density. 

• We study the two-point correlation function of galac- 
tic halos at scales 0.05 — 8/i^^ Mpc. We confirm and 
extend results of our previous studies (Kravtsov & 
Klypin 1999; Colin et al. 1999) based on lower res- 
olution simulations, in which we found that 1) the 
halo correlation function can be well described by 
a power law at scales > 300/i^^ kpc at all epochs; 
2) the amplitude of the correlation function evolves 
only weakly with time; and 3) the evolution results 
in a small-scale anti-bias at the present day epoch. 

• We find that the small-scale anti-bias is consider- 
ably smaller in the low-normalization, erg = 0.75, 
simulation than in the as = 1 model. This is consis- 
tent with the picture that the anti-bias is caused by 
the halo disruption processes in clusters (Kravtsov 
& Klypin 1999), as the clusters in the low-erg model 
form later and the disruption processes have less 
time to operate. 

• The halo clustering strength depends on the max- 
imum circular velocities of the halos (and hence 



their mass). The dependence is weak for l^nax < 
200 kms~^, but becomes stronger for higher circu- 
lar velocities. The dependence of correlation length, 
To, on the number density of the halo sample is in 
general agreement with the clustering of galaxies in 
the SDSS survey. 

• We study the one- and two-halo components of the 
two-point correlation function of halos at different 

epochs. At z = 0, the transition between these com- 
ponents is relatively smooth and the CF can be well 
fit by a single power law down to « lOO/i^^ kpc. At 
higher redshifts the transition becomes more pro- 
nounced and occurs at smaller scales. The signif- 
icant departures from the power law may thus be 
easier to detect in high-redshift galaxy surveys than 
at the present epoch. These departures can be used 
to put useful constraints on the environments and 
formation of galaxies. 

• If the deviations from a power law for the galaxy 
CF are as strong as indicated by our results, the 
assumption of the single power law often used in 
observational analyses of high-redshift clustering is 
dangerous and is likely to bias the estimates of the 
correlation length and/or slope of the correlation 
function. For the halos in our samples at z = 3 the 
correlation function steepens at r ~ 300h~^ comov- 
ing kpc. There are indications that the clustering 
strength of z = 3 LBGs becomes stronger than the 
large scale power-law at a similar scale (Adelberger 
et al. 2003). 
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